Skip to content

Conversation

@GuyEshdat
Copy link
Contributor

@GuyEshdat GuyEshdat commented Aug 10, 2025

Summary by CodeRabbit

  • New Features

    • Added comprehensive support for the Dremio data warehouse across workflows, macros, and integration tests.
    • Introduced macros for Dremio-specific SQL operations, including datetime handling, reserved keyword escaping, table operations, and standard deviation calculations.
    • Enhanced integration tests with Dremio Docker Compose setup and dynamic schema/database property resolution.
    • Improved handling of reserved SQL keywords in queries and test data.
  • Bug Fixes

    • Corrected aliasing and escaping of reserved keywords in SQL queries to prevent syntax errors.
    • Refined anomaly score logic to handle single-value training sets.
  • Refactor

    • Updated test data column naming conventions for clarity and consistency.
    • Streamlined macro and test implementations for better cross-database compatibility.
  • Chores

    • Updated .gitignore to exclude additional internal directories.
    • Expanded workflow test matrices to include Dremio.

GuyEshdat and others added 30 commits August 4, 2025 16:22
* dremio support in anomaly tests - WIP

* git ignore dbt_internal_packages (relevant for fusion but no harm in adding now)

* get_package_database_and_schema - use non-views for dremio

* anomaly tests fixes

* remove space

* add newline

* dremio datediff - different implementation for seconds diff
…dremio schema names are dymainc (depends if model is table or not
@GuyEshdat GuyEshdat temporarily deployed to elementary_test_env August 10, 2025 11:28 — with GitHub Actions Inactive
@GuyEshdat GuyEshdat temporarily deployed to elementary_test_env August 10, 2025 11:28 — with GitHub Actions Inactive
@GuyEshdat GuyEshdat temporarily deployed to elementary_test_env August 10, 2025 11:34 — with GitHub Actions Inactive
Comment on lines +44 to +46
{% macro dremio__edr_day_of_week_expression(date_expr) %}
TO_CHAR({{ date_expr }}, 'DAY')
{% endmacro %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

@GuyEshdat GuyEshdat temporarily deployed to elementary_test_env August 10, 2025 17:52 — with GitHub Actions Inactive
@GuyEshdat GuyEshdat temporarily deployed to elementary_test_env August 10, 2025 17:52 — with GitHub Actions Inactive
@GuyEshdat GuyEshdat temporarily deployed to elementary_test_env August 10, 2025 17:52 — with GitHub Actions Inactive
@GuyEshdat GuyEshdat temporarily deployed to elementary_test_env August 10, 2025 17:52 — with GitHub Actions Inactive
@GuyEshdat GuyEshdat temporarily deployed to elementary_test_env August 10, 2025 17:52 — with GitHub Actions Inactive
@GuyEshdat GuyEshdat temporarily deployed to elementary_test_env August 10, 2025 17:52 — with GitHub Actions Inactive
@GuyEshdat GuyEshdat temporarily deployed to elementary_test_env August 10, 2025 17:52 — with GitHub Actions Inactive
@GuyEshdat GuyEshdat temporarily deployed to elementary_test_env August 10, 2025 17:52 — with GitHub Actions Inactive
@GuyEshdat GuyEshdat temporarily deployed to elementary_test_env August 10, 2025 17:52 — with GitHub Actions Inactive
@GuyEshdat GuyEshdat temporarily deployed to elementary_test_env August 10, 2025 17:52 — with GitHub Actions Inactive
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b2fddf and b66e737.

📒 Files selected for processing (2)
  • macros/utils/cross_db_utils/generate_elementary_profile_args.sql (1 hunks)
  • macros/utils/cross_db_utils/get_user_creation_query.sql (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: GuyEshdat
PR: elementary-data/dbt-data-reliability#838
File: macros/utils/table_operations/insert_rows.sql:156-0
Timestamp: 2025-08-10T11:28:43.591Z
Learning: In the Elementary dbt-data-reliability codebase, the `dremio__escape_special_chars` macro in `macros/utils/table_operations/insert_rows.sql` receives strings where single quotes are already escaped as `\'` from upstream processing. The macro correctly converts these to Dremio's SQL escaping format by replacing `\'` with `''`. This is different from other database implementations that handle bare single quotes directly.
📚 Learning: 2025-07-27T11:33:35.918Z
Learnt from: haritamar
PR: elementary-data/dbt-data-reliability#827
File: macros/utils/cross_db_utils/generate_elementary_profile_args.sql:30-31
Timestamp: 2025-07-27T11:33:35.918Z
Learning: In the Elementary dbt package, the Snowflake profile generation macro intentionally always includes private_key and private_key_passphrase parameters regardless of the authentication method. This is a strategic decision to deprecate password authentication and guide all new users toward keypair authentication, even if it means showing keypair parameters when they might not currently be needed.

Applied to files:

  • macros/utils/cross_db_utils/generate_elementary_profile_args.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: test (latest_official, databricks_catalog) / test
  • GitHub Check: test (1.8.0, postgres) / test
  • GitHub Check: test (latest_official, trino) / test
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (latest_official, redshift) / test
  • GitHub Check: test (latest_pre, postgres) / test
  • GitHub Check: test (latest_official, postgres) / test
  • GitHub Check: test (latest_official, snowflake) / test
  • GitHub Check: test (latest_official, bigquery) / test
  • GitHub Check: test (latest_official, clickhouse) / test
🔇 Additional comments (2)
macros/utils/cross_db_utils/generate_elementary_profile_args.sql (2)

188-191: Confirm mapping of database/schema to Dremio source/folder.

This maps elementary_database -> object_storage and elementary_schema -> object_storage_path. Ensure elementary.get_package_database_and_schema() returns the Dremio source and folder path under Dremio, not conventional database/schema names.

If needed, I can adjust the Dremio dispatch of get_package_database_and_schema to return the correct pair.


186-193: No changes needed—Dremio Cloud fields are correct

The Dremio adapter for Cloud specifically exposes target.cloud_project_id and target.cloud_host (rather than generic project_id/host). In our macro we map those to the Elementary service parameter names "project_id" and "host", and the object storage parameters ("object_storage" and "object_storage_path") match the downstream profiling API. Everything aligns with both the Dremio adapter and Elementary’s expected arguments.

Comment on lines +184 to +194
{% macro dremio__generate_elementary_profile_args(method, elementary_database, elementary_schema) %}
{% do return([
_parameter("project_id", target.cloud_project_id),
_parameter("host", target.cloud_host),
_parameter("object_storage", elementary_database),
_parameter("object_storage_path", elementary_schema),
_parameter("user", target.user),
_parameter("token", "<TOKEN>"),
_parameter("threads", target.threads),
]) %}
{% endmacro %}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Include "type" for consistency with other adapters.

All other adapter macros include _parameter("type", target.type). Add it to Dremio to keep the generated profile consistent and self-contained.

 {% macro dremio__generate_elementary_profile_args(method, elementary_database, elementary_schema) %}
   {% do return([
+    _parameter("type", target.type),
     _parameter("project_id", target.cloud_project_id),
     _parameter("host", target.cloud_host),
     _parameter("object_storage", elementary_database),
     _parameter("object_storage_path", elementary_schema),
     _parameter("user", target.user),
     _parameter("token", "<TOKEN>"),
     _parameter("threads", target.threads),
   ]) %}
 {% endmacro %}
🤖 Prompt for AI Agents
In macros/utils/cross_db_utils/generate_elementary_profile_args.sql around lines
184 to 194, the macro dremio__generate_elementary_profile_args is missing the
_parameter("type", target.type) entry that other adapter macros include. Add
_parameter("type", target.type) to the returned list to ensure consistency and
completeness of the generated profile arguments.

Comment on lines +109 to +114
{% macro dremio__get_user_creation_query(parameters) %}
CREATE USER "{{ parameters["user"] }}";

GRANT USAGE ON PROJECT TO USER "{{ parameters["user"] }}";
GRANT SELECT ON ALL DATASETS IN FOLDER {% for part in (parameters["object_storage"] ~ "." ~ parameters["object_storage_path"]).split(".") %}"{{ part }}"{% if not loop.last %}.{% endif %}{% endfor %} TO USER "{{ parameters["user"] }}";
{% endmacro %}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Harden identifier quoting and handle empty path (avoid invalid FOLDER path + injection risks).

Current templating risks:

  • Unescaped double quotes in parameters["user"], ["object_storage"], or ["object_storage_path"] can break SQL.
  • If object_storage_path is empty, the split/join yields an empty quoted segment (""), producing invalid SQL.
  • When no path is provided, GRANT should likely target the source, not a folder.

Proposed fix replaces the block to safely escape quotes and support both source-only and folder paths.

-{% macro dremio__get_user_creation_query(parameters) %}
-CREATE USER "{{ parameters["user"] }}";
-
-GRANT USAGE ON PROJECT TO USER "{{ parameters["user"] }}";
-GRANT SELECT ON ALL DATASETS IN FOLDER {% for part in (parameters["object_storage"] ~ "." ~ parameters["object_storage_path"]).split(".") %}"{{ part }}"{% if not loop.last %}.{% endif %}{% endfor %} TO USER "{{ parameters["user"] }}";
-{% endmacro %}
+{% macro dremio__get_user_creation_query(parameters) %}
+{%- set user = parameters["user"] | replace('"', '""') -%}
+{%- set source = parameters["object_storage"] | replace('"', '""') -%}
+{%- set path = parameters.get("object_storage_path") -%}
+
+CREATE USER "{{ user }}";
+
+GRANT USAGE ON PROJECT TO USER "{{ user }}";
+{%- if path %}
+GRANT SELECT ON ALL DATASETS IN FOLDER "{{ source }}"{% for seg in path.split('.') %}."{{ seg | replace('"','""') }}"{% endfor %} TO USER "{{ user }}";
+{%- else %}
+GRANT SELECT ON ALL DATASETS IN "{{ source }}" TO USER "{{ user }}";
+{%- endif %}
+{% endmacro %}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
{% macro dremio__get_user_creation_query(parameters) %}
CREATE USER "{{ parameters["user"] }}";
GRANT USAGE ON PROJECT TO USER "{{ parameters["user"] }}";
GRANT SELECT ON ALL DATASETS IN FOLDER {% for part in (parameters["object_storage"] ~ "." ~ parameters["object_storage_path"]).split(".") %}"{{ part }}"{% if not loop.last %}.{% endif %}{% endfor %} TO USER "{{ parameters["user"] }}";
{% endmacro %}
{% macro dremio__get_user_creation_query(parameters) %}
{%- set user = parameters["user"] | replace('"', '""') -%}
{%- set source = parameters["object_storage"] | replace('"', '""') -%}
{%- set path = parameters.get("object_storage_path") -%}
CREATE USER "{{ user }}";
GRANT USAGE ON PROJECT TO USER "{{ user }}";
{%- if path %}
GRANT SELECT ON ALL DATASETS IN FOLDER "{{ source }}"{% for seg in path.split('.') %}."{{ seg | replace('"','""') }}"{% endfor %} TO USER "{{ user }}";
{%- else %}
GRANT SELECT ON ALL DATASETS IN "{{ source }}" TO USER "{{ user }}";
{%- endif %}
{% endmacro %}
🤖 Prompt for AI Agents
In macros/utils/cross_db_utils/get_user_creation_query.sql around lines 109 to
114, the current macro does not properly escape double quotes in
parameters["user"], ["object_storage"], and ["object_storage_path"], which can
cause SQL injection or syntax errors. Also, if object_storage_path is empty, it
generates invalid SQL by including an empty quoted segment. To fix this,
implement proper escaping of double quotes in all identifiers and add logic to
check if object_storage_path is empty; if so, grant permissions on the source
only instead of a folder path. Adjust the templating to handle both cases
safely.

Comment on lines +112 to +114
GRANT USAGE ON PROJECT TO USER "{{ parameters["user"] }}";
GRANT SELECT ON ALL DATASETS IN FOLDER {% for part in (parameters["object_storage"] ~ "." ~ parameters["object_storage_path"]).split(".") %}"{{ part }}"{% if not loop.last %}.{% endif %}{% endfor %} TO USER "{{ parameters["user"] }}";
{% endmacro %}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Confirm Dremio SQL support and exact privilege syntax.

Validate that the following constructs are supported in your Dremio environment:

  • CREATE USER "";
  • GRANT USAGE ON PROJECT TO USER "";
  • GRANT SELECT ON ALL DATASETS IN FOLDER ... TO USER "";
  • When no folder path is provided, GRANT SELECT ON ALL DATASETS IN ""

If syntax differs between Dremio Cloud and Software, we may need conditional generation.


🌐 Web query:

Verify Dremio SQL support for:
1) Creating users via SQL (CREATE USER) in Dremio Cloud/Software and the exact syntax.
2) Granting USAGE ON PROJECT to a USER and the exact syntax.
3) Granting SELECT ON ALL DATASETS IN FOLDER "<source>"."<path>" and the alternative for source-level grants (without folder).
Note any differences between Dremio Cloud vs Software.

💡 Result:

Dremio provides SQL commands for user management and access control, with certain functionalities varying between Dremio Cloud and Dremio Software. Below are the details addressing your queries:

1) Creating Users via SQL (CREATE USER) in Dremio Cloud/Software and the Exact Syntax

  • Dremio Cloud: The CREATE USER command is supported, allowing administrators to create new users directly through SQL.

    Syntax:

  CREATE USER "<username>";

Example:

  CREATE USER "[email protected]";

Note: The username must be enclosed in double quotes. (docs.dremio.com)

  • Dremio Software: The CREATE USER command is not supported. User management is typically performed through the Dremio Console or by using the Dremio REST API. (docs.dremio.com)

2) Granting USAGE ON PROJECT to a USER and the Exact Syntax

In Dremio, the USAGE privilege on a project allows a user to access and interact with the project's contents.

Syntax:

GRANT USAGE ON PROJECT "<project_name>" TO USER "<username>";

Example:

GRANT USAGE ON PROJECT "Finance" TO USER "[email protected]";

Note: Both project and username are enclosed in double quotes. (docs.dremio.com)

3) Granting SELECT ON ALL DATASETS IN FOLDER ""."" and the Alternative for Source-Level Grants (Without Folder)

Dremio allows granting privileges on datasets within a specific folder or source.

  • Granting SELECT on All Datasets in a Folder:

    Syntax:

  GRANT SELECT ON ALL DATASETS IN FOLDER "<folder_path>" TO USER "<username>";

Example:

  GRANT SELECT ON ALL DATASETS IN FOLDER "Sales/2025" TO USER "[email protected]";

Note: The folder path and username are enclosed in double quotes. (docs.dremio.com)

  • Granting SELECT on All Datasets in a Source:

    Syntax:

  GRANT SELECT ON ALL DATASETS IN SOURCE "<source_name>" TO USER "<username>";

Example:

  GRANT SELECT ON ALL DATASETS IN SOURCE "SalesData" TO USER "[email protected]";

Note: The source name and username are enclosed in double quotes. (docs.dremio.com)

Differences Between Dremio Cloud and Dremio Software:

  • User Creation: Dremio Cloud supports the CREATE USER SQL command, whereas Dremio Software requires user management through the Dremio Console or REST API.

  • Granting Privileges: The SQL syntax for granting privileges, such as USAGE on projects and SELECT on datasets, is consistent across both Dremio Cloud and Dremio Software.

For comprehensive details on user management and access control in Dremio, refer to the official documentation:

Note: Always ensure that the usernames and object names are correctly specified and enclosed in double quotes as required by Dremio's SQL syntax.

Citations:


Update Dremio SQL syntax and handle Cloud vs Software differences
Target: macros/utils/cross_db_utils/get_user_creation_query.sql (lines 112–114)

– In Dremio Cloud only, you can create users via
CREATE USER "<username>";
Dremio Software does not support CREATE USER in SQL—you must call the REST API or use the UI. Add a flag or document this difference.

GRANT USAGE must include the project name:
Replace
sql GRANT USAGE ON PROJECT TO USER "{{ parameters.user }}";
with
sql GRANT USAGE ON PROJECT "{{ parameters.project }}" TO USER "{{ parameters.user }}";

GRANT SELECT ON ALL DATASETS IN FOLDER expects a single quoted, slash-separated path, not a series of dot-quoted identifiers. For example:
sql GRANT SELECT ON ALL DATASETS IN FOLDER "{{ parameters.object_storage }}/{{ parameters.object_storage_path }}" TO USER "{{ parameters.user }}";

– When there’s no folder path, use a source-level grant:
sql GRANT SELECT ON ALL DATASETS IN SOURCE "{{ parameters.object_storage }}" TO USER "{{ parameters.user }}";

Please update the macro to reflect these exact syntaxes and add conditional logic for Dremio Software’s lack of SQL user-creation.

🤖 Prompt for AI Agents
In macros/utils/cross_db_utils/get_user_creation_query.sql around lines 112 to
114, update the GRANT USAGE statement to include the project name by changing it
to GRANT USAGE ON PROJECT "{{ parameters.project }}" TO USER "{{ parameters.user
}}". Replace the GRANT SELECT ON ALL DATASETS IN FOLDER statement to use a
single quoted, slash-separated path "{{ parameters.object_storage }}/{{
parameters.object_storage_path }}" instead of dot-quoted identifiers. Add
conditional logic to grant SELECT ON ALL DATASETS IN SOURCE "{{
parameters.object_storage }}" TO USER "{{ parameters.user }}" when there is no
folder path. Also, add a flag or conditional to only include CREATE USER
"<username>"; in Dremio Cloud, omitting it for Dremio Software since it does not
support SQL user creation.

@GuyEshdat GuyEshdat merged commit 988cf3a into master Aug 11, 2025
14 checks passed
@GuyEshdat GuyEshdat deleted the ele-4874-dremio branch August 11, 2025 07:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants